1. Import of 2016 data and libraries

library(tidyverse)
library(plotly)
library(scales)
library(foreign)
library(psych)

X2016 <- as_tibble(read.spss("C:/Users/User/OneDrive/Desktop/rmijankyal/Midtermproj/2016.sav", to.data.frame = TRUE))

2. Visualisation of expenditures

Frequency of expenditures

histogram <- ggplot(X2016) +
  geom_histogram(aes(x = expend), fill = "blue", bins = 100) +
  labs(y = "Frequency", x = "Expenditure")
ggplotly(histogram)

Density curves of expenditures and totincome

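# after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
density <- ggplot(data = X2016) +
  labs(title = "Density of Expenditure & Totincome", y = "Density", x = "Expenditure, Totincome") +
  geom_density(aes(x = expend, y = after_stat(density))) +                   # expenditures (black)
  geom_density(aes(x = totincome, y = after_stat(density)), color = "red")   # total income (red)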

ggplotly(density)

Boxplot of expenditures

boxplot <- ggplot(data = X2016) + labs(title = "Boxplot of expenditures", y = "Expenditure") +
  geom_boxplot(aes(y = expend))

ggplotly(boxplot)

The graphs show that the distribution of expenditures is not normal: it is strongly right-skewed (positive skewness) and has very high kurtosis, so most of the outliers lie on the right side of the graph.
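To put numbers on the shape seen in the graphs, skewness and excess kurtosis can be computed directly from their moment definitions. The sketch below is only illustrative: it uses simulated log-normal values as a stand-in for expend (an assumption, not the survey data), and the helper names skewness and excess_kurtosis are hypothetical. The figures reported by describe() may differ slightly because it uses a different estimator variant.

```r
# Simulated right-skewed values standing in for X2016$expend (assumption)
set.seed(1)
x <- rlnorm(5000, meanlog = 11.5, sdlog = 0.7)

# Moment-based skewness: third central moment scaled by variance^(3/2)
skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

# Moment-based excess kurtosis: fourth central moment scaled by variance^2, minus 3
excess_kurtosis <- function(v) {
  d <- v - mean(v)
  mean(d^4) / mean(d^2)^2 - 3
}

skew_x <- skewness(x)
kurt_x <- excess_kurtosis(x)

# For a right-skewed sample the mean sits above the median
c(mean = mean(x), median = median(x), skew = skew_x, kurtosis = kurt_x)
```

A positive skewness together with a mean above the median is the same pattern the histogram and boxplot show for expenditures.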

3. Descriptive statistics

Desc.stat for expenditures:

library(pastecs)
options(scipen = 100)
options(digits = 2)
stat.desc(X2016$expend)
##        nbr.val       nbr.null         nbr.na            min            max 
##        5184.00           0.00           0.00        8186.49     2923503.83 
##          range            sum         median           mean        SE.mean 
##     2915317.34   742300247.45      116002.20      143190.63        1906.53 
##   CI.mean.0.95            var        std.dev       coef.var 
##        3737.60 18843019184.87      137269.88           0.96
library(psych)
describe(X2016$expend)
##    vars    n   mean     sd median trimmed   mad  min     max   range skew
## X1    1 5184 143191 137270 116002  123986 72171 8186 2923504 2915317  8.5
##    kurtosis   se
## X1      131 1907

Desc.stat for totincome:

library(pastecs)
options(scipen = 100)
options(digits = 2)
stat.desc(X2016$totincome)
##        nbr.val       nbr.null         nbr.na            min            max 
##        5184.00           3.00           0.00           0.00     2370500.00 
##          range            sum         median           mean        SE.mean 
##     2370500.00  1013060788.91      150267.07      195420.68        2476.32 
##   CI.mean.0.95            var        std.dev       coef.var 
##        4854.63 31789126976.58      178295.06           0.91
library(psych)
describe(X2016$totincome)
##    vars    n   mean     sd median trimmed    mad min     max   range skew
## X1    1 5184 195421 178295 150267  164570 106473   0 2370500 2370500  3.4
##    kurtosis   se
## X1       20 2476

- As the results show, the mean of total income is higher than the mean of expenditures, which suggests that on average households in Armenia had savings in 2016. Further analysis will show, however, that in many cases a household's expenditures exceed its total income.
- Although the mean of total income is higher, the range of expenditures (2915317) is wider than that of total income (2370500). The range depends only on the two most extreme values, though. The standard deviation of totincome (178295) is larger than that of expenditures (137270), while the coefficients of variation are close (0.91 and 0.96), so the relative dispersion of the two variables is similar.
- As the density graphs also show, the skewness of expenditures (8.5) exceeds that of totincome (3.4). Both are positively skewed, that is, they have a heavy tail on the right side.
- The kurtosis of totincome (20) is much smaller than that of expenditures (131), so the expenditure distribution has the heavier tails. Kurtosis by itself says nothing about the variance, but the descriptive statistics show that the variance of totincome is about 1.7 times that of expenditures.
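The dispersion comparison above can be checked with a few lines of arithmetic on the figures printed by stat.desc(). This is a minimal sketch that re-enters the reported means and standard deviations by hand; the variable names are only illustrative.

```r
# Values copied from the stat.desc() output above
mean_expend <- 143190.63
sd_expend   <- 137269.88
mean_income <- 195420.68
sd_income   <- 178295.06

# Coefficient of variation: standard deviation relative to the mean
cv_expend <- sd_expend / mean_expend
cv_income <- sd_income / mean_income

# Ratio of the variances (totincome over expend)
var_ratio <- (sd_income / sd_expend)^2

round(c(cv_expend = cv_expend, cv_income = cv_income, var_ratio = var_ratio), 2)
```

The coefficients of variation match the coef.var column of stat.desc(), and the variance ratio comes out well below 3.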

4. T-test: are the average household expenditures in Armenia equal to the mean of totincome?

\(H_0\): The average household expenditures in Armenia are 195421 AMD.
\(H_1\): The average household expenditures in Armenia are not equal to 195421 AMD (that is the mean of totincome).

t.test(x = X2016$expend, mu = mean(X2016$totincome), alternative = "two.sided")
## 
##  One Sample t-test
## 
## data:  X2016$expend
## t = -27, df = 5183, p-value <0.0000000000000002
## alternative hypothesis: true mean is not equal to 195421
## 95 percent confidence interval:
##  139453 146928
## sample estimates:
## mean of x 
##    143191

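Since the p-value is practically 0 (below 2e-16), we reject the null hypothesis: the average household expenditure (143191 AMD, 95% confidence interval 139453 to 146928) is not equal to the mean of totincome (195421 AMD). Note that this test treats the sample mean of totincome as a fixed constant.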
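The one-sample t statistic can be reproduced by hand as (sample mean - hypothesised mean) / standard error. The sketch below re-enters the figures reported by stat.desc() and is meant only as a check of the arithmetic; the variable names are illustrative.

```r
# Values copied from the stat.desc() output for expend and totincome
n           <- 5184
mean_expend <- 143190.63
se_expend   <- 1906.53     # SE.mean of expend
mu0         <- 195420.68   # mean of totincome, used as the hypothesised value

# t statistic and two-sided p-value with n - 1 degrees of freedom
t_stat  <- (mean_expend - mu0) / se_expend
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# 95% confidence interval for the mean of expend
ci <- mean_expend + c(-1, 1) * qt(0.975, df = n - 1) * se_expend

round(t_stat, 1)
round(ci)
```

The statistic rounds to the t = -27 printed by t.test(), and the interval agrees with the one in the output.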

5. Independent t-test

\(H_0\): The average household expenditures in different types of settlements in Armenia are equal.
\(H_1\): The average household expenditures in different types of settlements in Armenia are not equal.

X2016 <- X2016 %>%
  mutate(sett_new = case_when(
    settlement %in% c("Yerevan", "other urban") ~ "urban",
    settlement == "rural" ~ "rural"
  ))
table(X2016$settlement, X2016$sett_new)
##              
##               rural urban
##   Yerevan         0  1404
##   other urban     0  1836
##   rural        1944     0
t.test(X2016$expend~X2016$sett_new)
## 
##  Welch Two Sample t-test
## 
## data:  X2016$expend by X2016$sett_new
## t = -4, df = 5058, p-value = 0.0002
## alternative hypothesis: true difference in means between group rural and group urban is not equal to 0
## 95 percent confidence interval:
##  -20353  -6189
## sample estimates:
## mean in group rural mean in group urban 
##              134896              148167

Since the p-value (0.0002) is lower than 0.01, we reject the null hypothesis: average household expenditures differ between rural (134896 AMD) and urban (148167 AMD) settlements.
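The same kind of test can be written with the data argument, and the returned object can be queried for the pieces used in the interpretation. The sketch below runs on simulated data (an assumption, not the survey), with a hypothetical data frame called demo that mimics the expend and sett_new columns.

```r
set.seed(42)

# Simulated data (assumption): two groups with different means and spreads
demo <- data.frame(
  expend   = c(rnorm(200, mean = 135000, sd = 40000),
               rnorm(300, mean = 148000, sd = 60000)),
  sett_new = rep(c("rural", "urban"), times = c(200, 300))
)

# var.equal = FALSE is the default, so this is Welch's test
welch <- t.test(expend ~ sett_new, data = demo)

welch$estimate   # group means (rural first, then urban)
welch$conf.int   # 95% CI for mean(rural) - mean(urban)
welch$p.value

# Difference in means that the confidence interval is built around
diff_means <- unname(welch$estimate[1] - welch$estimate[2])
```

Reporting the difference in means with its confidence interval, rather than the p-value alone, shows how large the gap between the groups is.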

6. Paired Sample t-test

\(H_0\): The average household expenditures and total income in Armenia are equal.
\(H_1\): The average household expenditures and total income in Armenia are not equal.

t.test(x = X2016$monincome, y =X2016$expend)
## 
##  Welch Two Sample t-test
## 
## data:  X2016$monincome and X2016$expend
## t = 13, df = 9766, p-value <0.0000000000000002
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  33081 45270
## sample estimates:
## mean of x mean of y 
##    182366    143191

Since the p-value is very close to 0, we reject the null hypothesis: the means of monincome (182366 AMD) and expenditures (143191 AMD) are not equal. Note that the call above omits paired = TRUE and uses monincome rather than totincome, so the output is a Welch two-sample test on monincome and expend, not a paired test on total income. Economically, income exceeding expenditures on average suggests that households in Armenia either saved or had outflows not captured in expend, such as interest payments on loans taken previously; the latter seems more realistic than the former.
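For reference, a paired test is requested with paired = TRUE and is equivalent to a one-sample test on the within-household differences. The sketch below uses simulated income and expenditure values (an assumption, not the survey data) where each position in the two vectors belongs to the same hypothetical household.

```r
set.seed(7)
n <- 100

# Simulated paired measurements (assumption): one income and one
# expenditure value per household
income <- rnorm(n, mean = 180000, sd = 50000)
expend <- income - rnorm(n, mean = 30000, sd = 20000)

# Paired t-test on the two columns
paired <- t.test(x = income, y = expend, paired = TRUE)

# Equivalent one-sample t-test on the per-household differences
on_diffs <- t.test(income - expend, mu = 0)

paired$method
c(paired = unname(paired$statistic), on_diffs = unname(on_diffs$statistic))
```

Because the pairing removes between-household variation, the paired test has n - 1 degrees of freedom instead of the roughly 2n used by a two-sample test.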

7. Expenditure levels

X2016 <- X2016 %>%
  mutate(expendLevels = case_when(
    expend <= 57000  ~ "Very low",
    expend <= 89000  ~ "Low",
    expend <= 300000 ~ "Medium",
    expend <= 500000 ~ "High",
    expend > 500000  ~ "Very high"
  ))

X2016$expendLevels <- factor(X2016$expendLevels, ordered = TRUE,
                             levels = c("Very low", "Low", "Medium", "High", "Very high"))
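The binning and the conversion to an ordered factor can also be done in one step with base R's cut(). This is an alternative sketch, not the method used above; it runs on a small hand-made vector (demo_expend, a hypothetical name) chosen to sit on and just above each threshold.

```r
# Same thresholds as in the case_when() call; right = TRUE makes each
# interval closed on the right, matching the <= comparisons
breaks <- c(-Inf, 57000, 89000, 300000, 500000, Inf)
labels <- c("Very low", "Low", "Medium", "High", "Very high")

# Illustrative values on and just above each boundary (assumption)
demo_expend <- c(57000, 57001, 89000, 300000, 300001, 500000, 500001)

demo_levels <- cut(demo_expend, breaks = breaks, labels = labels,
                   right = TRUE, ordered_result = TRUE)

data.frame(expend = demo_expend, level = demo_levels)
```

Checking values exactly on a boundary is a quick way to confirm that a threshold such as 57000 falls in the lower class, as it does in the case_when() version.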

8. Chi-Square test

\(H_0\): Expenditure levels and marital status are independent.
\(H_1\): Expenditure levels and marital status are not independent.

chisq.test(X2016$headmerstatus, X2016$expendLevels)
## 
##  Pearson's Chi-squared test
## 
## data:  X2016$headmerstatus and X2016$expendLevels
## X-squared = 632, df = 16, p-value <0.0000000000000002

Since the p-value is practically 0, well below 0.05, we reject the null hypothesis: expenditure levels and marital status are not independent. The test establishes an association, not its direction or its cause. One plausible explanation is that, for example, those who are married and have children have higher family-related expenses.
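With more than five thousand observations even a weak association gives a tiny p-value, so it helps to look at an effect size as well. A common choice is Cramér's V, computed as sqrt(X-squared / (n * min(rows - 1, columns - 1))). The sketch below plugs in the reported statistic and assumes a 5 x 5 table, which is what df = 16 implies for five expenditure levels.

```r
# Values taken from the chisq.test() output above
chi_sq <- 632
n      <- 5184

# Table dimensions (assumption): 5 marital-status groups x 5 expenditure
# levels, consistent with df = (5 - 1) * (5 - 1) = 16
n_rows <- 5
n_cols <- 5

cramers_v <- sqrt(chi_sq / (n * min(n_rows - 1, n_cols - 1)))
round(cramers_v, 2)
```

Cramér's V ranges from 0 (no association) to 1 (perfect association), so the value here points to a statistically clear but modest relationship.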

9. ANOVA

\(H_0\): The means in the groups of “Marital Status” are equal.
\(H_1\): At least one mean is different.

summary(aov(data = X2016,expend ~ headmerstatus))
##                 Df         Sum Sq      Mean Sq F value              Pr(>F)    
## headmerstatus    4  2398958537366 599739634342    32.6 <0.0000000000000002 ***
## Residuals     5179 95264409897838  18394363757                                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As the p-value of the F test (F = 32.6) is very low, almost equal to 0, we reject our null hypothesis, which means at least one mean in the groups of “Marital Status” differs from the others. Economically, this test and the previous one point to the same conclusion.
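The ANOVA table also tells us how much of the variation in expenditures the grouping accounts for. One simple measure is eta squared, the between-group sum of squares divided by the total sum of squares. The sketch below re-enters the sums of squares from the table above; the variable names are illustrative.

```r
# Values copied from the summary(aov(...)) table above
ss_between <- 2398958537366
ss_resid   <- 95264409897838
df_between <- 4
df_resid   <- 5179

# F statistic recomputed from the mean squares
f_value <- (ss_between / df_between) / (ss_resid / df_resid)

# Eta squared: share of total variation explained by the grouping
eta_sq <- ss_between / (ss_between + ss_resid)

round(c(F = f_value, eta_squared = eta_sq), 3)
```

The F test only says that some group mean differs. To see which groups differ, a post-hoc comparison such as TukeyHSD() on the aov fit is a common next step.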

10. Expenditures in different regions

marzbox <- ggplot() + labs(title = "Expenditures by marz", y = "Expenditures", x = "Marz") +
  geom_boxplot(data = X2016, aes(y = expend, x = marz, color = marz))

ggplotly(marzbox)